Huawen Shen

ChartArena: Benchmarking Chart Parsing across Languages, Scenarios, and Formats

May 31, 2026

PhoneWorld: Scaling Phone-Use Agent Environments

May 28, 2026

Towards Real-World Document Parsing via Realistic Scene Synthesis and Document-Aware Training

Mar 25, 2026

MMTIT-Bench: A Multilingual and Multi-Scenario Benchmark with Cognition-Perception-Reasoning Guided Text-Image Machine Translation

Mar 25, 2026

Does the Question Really Matter? Training-Free Data Selection for Vision-Language SFT

Mar 10, 2026

Gather and Trace: Rethinking Video TextVQA from an Instance-oriented Perspective

Aug 06, 2025

Beyond Cropped Regions: New Benchmark and Corresponding Baseline for Chinese Scene Text Retrieval in Diverse Layouts

Jun 05, 2025

Char-SAM: Turning Segment Anything Model into Scene Text Segmentation Annotator with Character-level Visual Prompts

Dec 27, 2024

LDP: Generalizing to Multilingual Visual Information Extraction by Language Decoupled Pretraining

Dec 19, 2024

Track the Answer: Extending TextVQA from Image to Video with Spatio-Temporal Clues

Dec 17, 2024